Text Segmentation Using Roget-Based Weighted Lexical Chains
Abstract
In this article we present a new method for text segmentation. The method relies on the number of lexical chains (LCs) that end in a sentence, that begin in the following sentence, and that traverse the two successive sentences. The lexical chains are based on Roget’s thesaurus (the 1987 and 1911 versions). We evaluate the method on ten texts from the DUC 2002 conference and on twenty texts from the CAST project corpus, using a manual segmentation as the gold standard.
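The three counts named in the abstract can be illustrated with a minimal sketch. It assumes that each lexical chain has already been built and is reduced to the indices of the first and last sentences it touches; the names `Chain`, `GapCounts`, and `gap_counts` are hypothetical and do not come from the article. How the article weights or combines these counts into a segmentation decision is not stated in the abstract, so the sketch stops at the counts themselves.

```python
from typing import List, NamedTuple


class Chain(NamedTuple):
    """A lexical chain reduced to its sentence span (an assumption of this sketch)."""
    first: int  # index of the first sentence containing a chain member
    last: int   # index of the last sentence containing a chain member


class GapCounts(NamedTuple):
    ending: int      # chains whose last member is in sentence i
    beginning: int   # chains whose first member is in sentence i + 1
    traversing: int  # chains that span both sentence i and sentence i + 1


def gap_counts(chains: List[Chain], num_sentences: int) -> List[GapCounts]:
    """Return the three counts for every gap between sentences i and i + 1."""
    gaps = []
    for i in range(num_sentences - 1):
        ending = sum(1 for c in chains if c.last == i)
        beginning = sum(1 for c in chains if c.first == i + 1)
        traversing = sum(1 for c in chains if c.first <= i and c.last >= i + 1)
        gaps.append(GapCounts(ending, beginning, traversing))
    return gaps


# Toy input: four chains over a four-sentence text (made-up spans).
chains = [Chain(0, 1), Chain(0, 2), Chain(2, 3), Chain(1, 1)]
counts = gap_counts(chains, num_sentences=4)
for i, c in enumerate(counts):
    print(f"gap {i}-{i + 1}: {c}")
```

Intuitively, a gap where many chains end or begin and few traverse is a plausible candidate for a segment boundary, but the exact scoring rule would have to be taken from the full article.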
Related articles
Segmenting Broadcast News Streams using Lexical Chains
In this paper we propose a coarse-grained NLP approach to text segmentation based on the analysis of lexical cohesion within text. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e. distinct news stories from broadcast news programmes. Our sy...
Lexical Chains Using Distributional Measures of Concept Distance
In practice, lexical chains are typically built using term reiteration or resource-based measures of semantic distance. The former approach misses out on a significant portion of the inherent semantic information in a text, while the latter suffers from the limitations of the linguistic resource it depends upon. In this paper, chains are constructed using the framework of distributional measure...
Word Sense Disambiguation and Text Segmentation Based on Lexical Cohesion
In this paper, we describe how word sense ambiguity can be resolved with the aid of lexical cohesion. By checking lexical cohesion between the current word and lexical chains in the order of the salience, in tandem with generation of lexical chains, we realize incremental word sense disambiguation based on contextual information that lexical chains reveal. Next, we describe how segmen...
Semantic Feature Structure Extraction from Documents Based on Extended Lexical Chains
The meaning of a sentence in a document is more easily determined if its constituent words exhibit cohesion with respect to their individual semantics. This paper explores the degree of cohesion among a document's words using lexical chains as a semantic representation of its meaning. Using a combination of diverse types of lexical chains, we develop a text document representation that can be u...
Using Information Density to Navigate the Web
This paper describes a system being developed to identify Internet WWW pages that most closely respond to a user's requirements. The system is designed to enhance, rather than replace, existing search engines. It collects pages identified by the search engine, and then uses an external lexical thesaurus to analyse their contents. This provides a secondary ordering metric for the pages based on a...
Journal:
- Computing and Informatics
Volume 32, Issue
Pages -
Publication date: 2013